Uber vs Bikes in Manhattan - A study with Python



Welcome to my Manhattan Mobility Study! We are going to take a dive into the Big Apple and try to get a better understanding of two of its forms of transportation: a bike sharing system and Uber rides.

How is each of them used throughout the day? Is there a big difference in their use between weekends and weekdays? Which are the favorite districts for cyclists? What are the dynamics of a neighborhood that is very popular among tourists? What about one popular with local workers?

We are going to try to answer some of these questions using Python, with a little help from folium, a very powerful library that will help us make some beautiful maps! You can see the full project on my GitHub.

Collecting the Data


We are going to collect data from three different sources: bike trips, Uber rides, and the shapefile with the neighborhood limits of Manhattan.

1. Bike Trips - Citibike Website

The bike trip data was collected from the website of Citibike, the most important bike sharing system in New York. We'll analyze over 1.4 million trips, with a lot of information about each one, such as duration, departure and arrival stations, the user's plan, their age, etc.

2. Uber Rides - FiveThirtyEight GitHub

We'll be able to explore over 700,000 Uber trips, thanks to FiveThirtyEight, which hosts some very interesting datasets and studies. This data was obtained from the NYC Taxi & Limousine Commission (TLC) through a Freedom of Information Law request.

3. Neighborhood limits - NYC Open Data website

The NYC Open Data website contains a lot of useful information about the city, provided and maintained by city agencies and offices. We can find data about education, business, environment, city landmarks, health, you name it… It is even possible to find a census of the squirrels in Central Park.

On this website, we were able to download the shapefile of the neighborhood limits, which will be very helpful in our spatial analysis.


We'll start with the bike trips

We have 716 rows without origin/destination references. This is a very small number, so let's just remove them.
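A minimal sketch of this cleaning step, assuming the Citibike CSV uses column names such as `start station id` and `end station id` (as in the 2018-era trip files; check your own file's header). The three rows below are made-up sample data, not real trips.

```python
import numpy as np
import pandas as pd

# Assumed Citibike column names (2018-era trip files); adjust to your CSV header.
STATION_COLS = [
    "start station id", "start station latitude", "start station longitude",
    "end station id", "end station latitude", "end station longitude",
]

def drop_missing_stations(bikes: pd.DataFrame) -> pd.DataFrame:
    """Remove trips that lack an origin or destination station reference."""
    missing = bikes[STATION_COLS].isnull().any(axis=1)
    print(f"Dropping {missing.sum()} rows without origin/destination references")
    return bikes.loc[~missing].reset_index(drop=True)

# Made-up rows; in the study, bikes would be the Citibike CSV loaded with pd.read_csv.
bikes = pd.DataFrame({
    "start station id": [72, np.nan, 79],
    "start station latitude": [40.767, np.nan, 40.719],
    "start station longitude": [-73.994, np.nan, -74.007],
    "end station id": [505, 3255, np.nan],
    "end station latitude": [40.749, 40.750, np.nan],
    "end station longitude": [-73.988, -73.987, np.nan],
})
clean = drop_missing_stations(bikes)
```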

Now, let's read the Uber trip data.

Great, there are no null values in the Uber trips!
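Here is a sketch of that null check, assuming the FiveThirtyEight September 2014 file (`uber-raw-data-sep14.csv`) has the columns `Date/Time`, `Lat`, `Lon` and `Base`, with timestamps like `9/1/2014 0:01:00`. The two rows are inlined sample data so the snippet runs on its own.

```python
import io
import pandas as pd

# Assumed layout of FiveThirtyEight's uber-raw-data-sep14.csv; rows are made-up samples.
sample_csv = io.StringIO(
    "Date/Time,Lat,Lon,Base\n"
    "9/1/2014 0:01:00,40.2201,-74.0021,B02512\n"
    "9/1/2014 17:45:00,40.7500,-73.9967,B02512\n"
)

uber = pd.read_csv(sample_csv)
print(uber.isnull().sum())  # one count per column; all zeros means no nulls

# Parse the timestamp once so later steps can group trips by day and hour.
uber["Date/Time"] = pd.to_datetime(uber["Date/Time"], format="%m/%d/%Y %H:%M:%S")
```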

Before moving on, there's one more thing we need to do:

We need to keep only the trips that started in Manhattan. With our current datasets alone, that would be impossible, since we only have latitude/longitude coordinates.

Luckily, we have a shapefile that contains the limits of the city and its neighborhoods. Not only will it allow us to filter all the trips that started in Manhattan, it will also help us a lot in our geospatial analysis section.
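With geopandas, this filter is usually a spatial join between trip points and neighborhood polygons (`geopandas.sjoin` with `predicate="within"` in recent versions). To show what that join does under the hood without extra dependencies, here is a plain-Python ray-casting sketch. The rectangles are hypothetical stand-ins; real shapefile polygons (multipolygons, holes) call for a proper GIS library.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (longitude, latitude), the usual GIS axis order

def point_in_polygon(pt: Point, polygon: List[Point]) -> bool:
    """Ray casting: cast a horizontal ray from pt and count how many polygon
    edges it crosses; an odd number of crossings means pt is inside."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the ring
        if (y1 > y) != (y2 > y):  # edge straddles the ray's latitude
            # longitude where this edge meets the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # crossing lies on the ray (to the right of pt)
                inside = not inside
    return inside

def neighborhood_of(pt: Point, neighborhoods: Dict[str, List[Point]]) -> Optional[str]:
    """Name of the first neighborhood whose polygon contains pt, or None."""
    for name, polygon in neighborhoods.items():
        if point_in_polygon(pt, polygon):
            return name
    return None

# Hypothetical rectangles standing in for real neighborhood polygons from the shapefile.
toy_neighborhoods = {
    "Area A": [(-74.02, 40.70), (-73.99, 40.70), (-73.99, 40.73), (-74.02, 40.73)],
    "Area B": [(-73.99, 40.73), (-73.95, 40.73), (-73.95, 40.78), (-73.99, 40.78)],
}

trip_starts = [(-74.00, 40.71), (-73.97, 40.75), (-73.90, 40.60)]
labels = [neighborhood_of(p, toy_neighborhoods) for p in trip_starts]
# Trips labeled None started outside every polygon and are dropped.
manhattan_trips = [p for p, lab in zip(trip_starts, labels) if lab is not None]
```

Labeling each trip with its neighborhood (rather than just a yes/no flag) is deliberate: the same label is what the neighborhood-level maps need later on.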

EDA


Before we start, let me give an important note: the bike and Uber trips are both from September, but from different years (the Uber rides from 2014 and the bike trips from 2018). That fact should always be taken into account when looking at the comparisons proposed here, and I invite you to always view them with a critical eye.

For example, when analyzing the numbers presented, it would be inappropriate to conclude that there are twice as many bike trips as Uber trips, since the absolute number of trips for each mode has certainly changed a lot in 4 years.

However, it seems fair to assume that the dynamics of the city and some user behaviors have remained the same: the time people go to work is unlikely to have changed abruptly, the districts popular among Uber users are probably the same in 2014 and 2018, and so on. Using that premise, we can carry out a lot of interesting analyses.

DAILY TRIPS OVER TIME

Let’s look at the number of daily trips over time. Since we are analyzing different periods, the idea here is not to compare absolute numbers but the overall behavior: is the number of trips somewhat stable, or are there very high/low peaks? Is there a growth trend in our data? What about weekly seasonality?
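Counting trips per day is short work in pandas. The sketch below also rescales each series by its own mean; that normalization is my assumption, not necessarily what the study did, but it is a convenient way to compare the shape of two curves whose absolute sizes aren't comparable.

```python
import pandas as pd

def daily_trips(start_times: pd.Series) -> pd.Series:
    """Number of trips per calendar day, given trip start timestamps."""
    days = pd.to_datetime(start_times).dt.floor("D")
    return days.value_counts().sort_index()

def relative_to_mean(counts: pd.Series) -> pd.Series:
    """Each day's count relative to the period average (1.0 = an average day),
    so curves from different years can be compared by shape rather than size."""
    return counts / counts.mean()

# Made-up timestamps; in practice, the bike start times or the Uber "Date/Time" column.
sample = pd.Series(["2018-09-01 08:00", "2018-09-01 19:30", "2018-09-02 07:15"])
counts = daily_trips(sample)
relative = relative_to_mean(counts)
```

Plotting `relative` for both modes on the same axes (e.g. with `Series.plot()`) makes dips and weekly cycles easy to spot.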

Looking at both curves, we can’t see a growth trend. Also, as we might expect, the number of bike trips has some sharp dips, while Uber trips are more stable. That makes sense, since bike trips can be very sensitive to bad weather.

Furthermore, Uber rides seem to have a strong weekly seasonality, which is harder to spot in the bike data. Let’s dig a little deeper.

Trips per Day of Week and Type of Day
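A sketch of how each trip could be tagged with its day of week and day type, and how to average trips per day of week. Averaging over dates (rather than summing) is my assumption about a fair comparison: a month contains four or five of each weekday, so raw totals would be biased. Column names are hypothetical.

```python
import numpy as np
import pandas as pd

def add_day_columns(trips: pd.DataFrame, time_col: str) -> pd.DataFrame:
    """Add the day-of-week name and a weekday/weekend label for each trip."""
    out = trips.copy()
    ts = pd.to_datetime(out[time_col])
    out["day_of_week"] = ts.dt.day_name()
    # dt.dayofweek is 0 for Monday ... 6 for Sunday, so 5 and 6 are the weekend
    out["day_type"] = np.where(ts.dt.dayofweek >= 5, "weekend", "weekday")
    return out

def avg_trips_per_day_of_week(trips: pd.DataFrame, time_col: str) -> pd.Series:
    """Average number of trips per day for each day of the week."""
    per_date = pd.to_datetime(trips[time_col]).dt.floor("D").value_counts()
    # Group the per-date counts by weekday name and average them
    return per_date.groupby(per_date.index.day_name()).mean()

# Made-up trips: Sep 1 and 8, 2018 are Saturdays, Sep 3, 2018 is a Monday.
trips = pd.DataFrame({"start": ["2018-09-01 10:00", "2018-09-01 12:00",
                                "2018-09-03 08:00", "2018-09-08 09:00"]})
labeled = add_day_columns(trips, "start")
averages = avg_trips_per_day_of_week(trips, "start")
```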

Key takeaways

Let’s take a look at how these trips are distributed throughout the day, to get a better understanding of why they happen.

TRIPS PER HOUR

Now we are going to see when the trips happen throughout the day, broken down by mode and day type. That might help us understand the dynamics of the city and how New Yorkers rely on each system: when people go to work, when they go home, which mode they use in these situations, whether they make a lot of trips outside peak hours, etc.
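Since absolute counts from different years (and from different numbers of weekdays and weekend days) aren't comparable, a natural choice is to look at the share of trips per hour. A minimal sketch, called once per mode; the column name passed in is an assumption.

```python
import numpy as np
import pandas as pd

def hourly_share(trips: pd.DataFrame, time_col: str) -> pd.DataFrame:
    """% of trips starting in each hour, with one column per day type.
    Call once per mode (bikes, Uber) to compare the two."""
    ts = pd.to_datetime(trips[time_col])
    df = pd.DataFrame({
        "hour": ts.dt.hour.values,
        "day_type": np.where(ts.dt.dayofweek >= 5, "weekend", "weekday"),
    })
    counts = df.groupby(["hour", "day_type"]).size().unstack("day_type", fill_value=0)
    # Normalize each column so weekday and weekend profiles are comparable
    return counts / counts.sum() * 100

# Made-up trips: three on Monday Sep 3, 2018 and one on Saturday Sep 1, 2018.
trips = pd.DataFrame({"start": ["2018-09-03 08:10", "2018-09-03 08:40",
                                "2018-09-03 18:00", "2018-09-01 14:00"]})
share = hourly_share(trips, "start")
```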

Key takeaways

Okay, now that we have a basic understanding of how New Yorkers use each mode, let's dig deeper through a set of geospatial analyses.


Geospatial Analysis

Now we'll try to find patterns by looking at the districts in the city. What are the most popular neighborhoods for cyclists? Are these neighborhoods also popular among Uber users? What are the dynamics of neighborhoods with many tourist attractions? Are the popular districts on weekends the same as on weekdays? Which neighborhoods have the busiest nightlife?

We are going to generate some maps with a little help from the folium library.

Bubble Map

First of all, let's take a look at where the Citibike stations are located. To do that, we'll plot a bubble map, where each bubble indicates the location of a station and its size represents the number of trips started at that station.
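A sketch of such a bubble map, assuming the Citibike start-station columns used earlier. The square-root radius scaling is my choice (so a bubble's area, not its radius, tracks the trip count), not necessarily the study's. `folium.Map` and `folium.CircleMarker` are real folium classes; folium is imported inside the map function so the data helpers run without it.

```python
import math
import pandas as pd

def station_counts(bikes: pd.DataFrame) -> pd.DataFrame:
    """One row per start station: id, coordinates and number of trips started there."""
    cols = ["start station id", "start station latitude", "start station longitude"]
    return bikes.groupby(cols).size().reset_index(name="trips")

def bubble_radius(trips: int, max_trips: int, max_radius: float = 15.0) -> float:
    """Square-root scaling so a bubble's area (not its radius) is proportional to trips."""
    return max_radius * math.sqrt(trips / max_trips)

def build_bubble_map(stations: pd.DataFrame):
    """Draw one circle per station with folium (requires folium to be installed)."""
    import folium  # imported here so the helpers above run without folium

    # Approximate center of Manhattan
    m = folium.Map(location=[40.7831, -73.9712], zoom_start=12)
    max_trips = stations["trips"].max()
    for lat, lon, n in zip(stations["start station latitude"],
                           stations["start station longitude"],
                           stations["trips"]):
        folium.CircleMarker(
            location=[lat, lon],
            radius=bubble_radius(n, max_trips),
            popup=f"{n} trips",
            fill=True,
            fill_opacity=0.6,
            weight=1,
        ).add_to(m)
    return m

# Made-up trips from two stations.
bikes = pd.DataFrame({
    "start station id": [72, 72, 79],
    "start station latitude": [40.767, 40.767, 40.719],
    "start station longitude": [-73.994, -73.994, -74.007],
})
stations = station_counts(bikes)
```

With folium installed, `build_bubble_map(stations).save("bubble_map.html")` writes the map to an HTML file (the file name is just an example).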

We can see that:

Heatmap

From the previous map, we got a good idea of how the stations are distributed and where the most popular regions among cyclists are. For Uber rides, we can't use the same kind of map, since trips can start from basically anywhere. So we need another kind of representation to analyze Uber trips: a heatmap!
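A sketch using folium's `HeatMap` plugin, which takes a list of `[lat, lon]` points. The optional random subsample is my assumption: hundreds of thousands of points can make the generated HTML heavy. The `radius` and `blur` values are illustrative, meant to be tuned by eye.

```python
import pandas as pd

def heat_points(uber: pd.DataFrame, max_points=None, seed: int = 0) -> list:
    """[lat, lon] pairs for a heatmap, optionally randomly subsampled."""
    pts = uber[["Lat", "Lon"]]
    if max_points is not None and len(pts) > max_points:
        pts = pts.sample(n=max_points, random_state=seed)
    return pts.values.tolist()

def build_heatmap(points: list):
    """Render the points as a folium heatmap (requires folium)."""
    import folium
    from folium.plugins import HeatMap

    m = folium.Map(location=[40.7831, -73.9712], zoom_start=12)
    # radius/blur control how far each point's "heat" spreads
    HeatMap(points, radius=10, blur=15).add_to(m)
    return m

# Made-up Uber pickups with the FiveThirtyEight column names.
uber = pd.DataFrame({"Lat": [40.75, 40.76, 40.72], "Lon": [-73.99, -73.98, -74.00]})
all_points = heat_points(uber)
subset = heat_points(uber, max_points=2)
```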

We can see some "hot spots" for Uber trips, indicated by the red zones on the map:

In the next steps, we are going to generate a new set of maps to visualize some metrics across the city. To do that, we'll use choropleth maps from now on.

Choropleth

Choropleth maps are a great way to visualize how a measure varies across geographic units. Each region (in our case, each neighborhood) is shaded with a different color according to the intensity of the variable we are analyzing. Since the color scale follows the intensity of the variable, these maps are very powerful for highlighting patterns in our data.

First, we'll create the maps with just the number of trips: first broken down by mode, and then by mode and type of day.
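A sketch of the aggregation behind these maps plus a `folium.Choropleth` call. It assumes each trip already carries a `neighborhood` label (from the spatial join) and a `day_type` label, and that the shapefile has been exported to GeoJSON. The `key_on` property name `ntaname` is an assumption about the NYC neighborhood file; check the attribute names in yours.

```python
import pandas as pd

def trips_per_neighborhood(trips: pd.DataFrame, by_day_type: bool = False):
    """Trip counts per neighborhood; with by_day_type, one column per day type.
    Assumes 'neighborhood' and 'day_type' columns exist."""
    if by_day_type:
        return trips.groupby(["neighborhood", "day_type"]).size().unstack(fill_value=0)
    return trips.groupby("neighborhood").size().rename("trips")

def build_choropleth(geojson_path: str, counts: pd.Series,
                     key_on: str = "feature.properties.ntaname"):
    """Shade each neighborhood polygon by its trip count (requires folium).
    key_on must point at the GeoJSON property holding the neighborhood name."""
    import folium

    m = folium.Map(location=[40.7831, -73.9712], zoom_start=12)
    folium.Choropleth(
        geo_data=geojson_path,
        data=counts.rename("trips").rename_axis("neighborhood").reset_index(),
        columns=["neighborhood", "trips"],
        key_on=key_on,
        fill_color="YlOrRd",
        legend_name="Number of trips",
    ).add_to(m)
    return m

# Made-up labeled trips.
trips = pd.DataFrame({
    "neighborhood": ["Midtown", "Midtown", "SoHo"],
    "day_type": ["weekday", "weekend", "weekday"],
})
total = trips_per_neighborhood(trips)
per_type = trips_per_neighborhood(trips, by_day_type=True)
```

The per-day-type maps reuse the same function: pass a single column such as `per_type["weekday"]` as `counts`.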

Uber Trips - Total

Bike Trips - Total

Uber Trips - Weekday

Bike Trips - Weekday

Uber Trips - Weekend

Bike Trips - Weekend

Key takeaways

Weekdays vs Weekends


First, a caveat: in the next analysis, we'll combine bike trips and Uber trips in the same map. As mentioned, the trips from each mode are four years apart, and since a lot can change in that period, I'd rather not let one mode have a greater influence on the result than the other just because of its absolute number of trips. Therefore, we'll make an adjustment to ensure that bike trips and Uber trips have the same weight.
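The exact adjustment isn't spelled out here. One simple way to give both modes the same weight, shown as a hypothetical sketch, is to turn each mode's counts into shares of that mode's grand total and average the two. Dividing by the grand total (not per column) keeps the weekday/weekend balance within each mode intact.

```python
import pandas as pd

def equal_weight_combine(bike_counts, uber_counts):
    """Combine per-neighborhood counts (Series or DataFrame) from two modes so that
    each mode contributes half of the total, whatever its absolute trip volume."""
    # Divide by each mode's grand total so both modes sum to 1
    bike_share = bike_counts / bike_counts.to_numpy().sum()
    uber_share = uber_counts / uber_counts.to_numpy().sum()
    # A neighborhood missing from one mode counts as a zero share for that mode
    return bike_share.add(uber_share, fill_value=0) / 2

# Made-up counts: bikes are far more numerous, but both modes end up weighing the same.
bike = pd.Series({"A": 900, "B": 100})
uber = pd.Series({"A": 10, "B": 30})
combined = equal_weight_combine(bike, uber)

bike_tbl = pd.DataFrame({"weekday": [60, 20], "weekend": [10, 10]}, index=["A", "B"])
uber_tbl = pd.DataFrame({"weekday": [1, 1], "weekend": [1, 1]}, index=["A", "B"])
combined_tbl = equal_weight_combine(bike_tbl, uber_tbl)
```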

Until now, we have focused a lot on the comparison between Uber and bikes. Now, we'll turn our attention to understanding the difference between weekdays and weekends.

Let's generate a map of the weekend index, which is the average number of daily trips from each neighborhood on weekends divided by the same metric on weekdays. A high weekend index indicates a district with a large share of its trips on weekends. Note that the index value itself is not meaningful, since we are mixing data with a 4-year difference. What matters most is the color contrast between regions.
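The index follows directly from that definition. Here is a sketch for one set of labeled trips, assuming a `neighborhood` column, a timestamp column, and that every calendar day in the period has at least one trip in the data (so the distinct dates found equal the days in the period).

```python
import numpy as np
import pandas as pd

def weekend_index(trips: pd.DataFrame, time_col: str) -> pd.Series:
    """Average trips per weekend day divided by average trips per weekday,
    for each neighborhood. Values above 1 mean busier weekends than weekdays."""
    dates = pd.to_datetime(trips[time_col]).dt.floor("D")
    df = pd.DataFrame({
        "neighborhood": trips["neighborhood"].values,
        "date": dates.values,
        "day_type": np.where(dates.dt.dayofweek >= 5, "weekend", "weekday"),
    })
    # Number of distinct weekend and weekday dates covered by the data
    n_days = df.groupby("day_type")["date"].nunique()
    counts = df.groupby(["neighborhood", "day_type"]).size().unstack(fill_value=0)
    avg_per_day = counts / n_days  # aligns n_days with the weekday/weekend columns
    return avg_per_day["weekend"] / avg_per_day["weekday"]

# Made-up trips: Sep 1-2, 2018 is a weekend; Sep 3-4, 2018 are weekdays.
trips = pd.DataFrame({
    "neighborhood": ["A", "A", "A", "B", "B", "B"],
    "start": ["2018-09-01 10:00", "2018-09-01 11:00", "2018-09-03 09:00",
              "2018-09-03 09:00", "2018-09-04 09:00", "2018-09-02 12:00"],
})
idx = weekend_index(trips, "start")
```

Dividing by the number of days of each type matters: a month has roughly 2.5 times more weekdays than weekend days, so raw counts would make every neighborhood look weekday-heavy.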

We could see a couple of interesting things:

Period of the Day


Now, we'll do a similar exercise looking at the time of day (morning, afternoon or night). We'll generate 3 maps, each colored based on an index that represents the % of trips that started in the given period. For example, the morning index for the Upper West Side is 37.7, which means that 37.7% of the trips started in that neighborhood began in the morning.
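A sketch of that index. The hour boundaries for morning, afternoon and night are hypothetical (the cut-offs used in the study aren't stated), and a `neighborhood` column is assumed.

```python
import pandas as pd

def period_of_day(hour: int) -> str:
    """Map an hour (0-23) to a period. These cut-offs are hypothetical."""
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    return "night"  # 18:00 to 04:59

def period_index(trips: pd.DataFrame, time_col: str) -> pd.DataFrame:
    """% of each neighborhood's trips that started in each period of the day."""
    hours = pd.to_datetime(trips[time_col]).dt.hour
    df = pd.DataFrame({
        "neighborhood": trips["neighborhood"].values,
        "period": hours.map(period_of_day).values,
    })
    counts = df.groupby(["neighborhood", "period"]).size().unstack(fill_value=0)
    # Divide each row by its total so the three periods sum to 100 per neighborhood
    return counts.div(counts.sum(axis=1), axis=0) * 100

# Made-up trips in two neighborhoods.
trips = pd.DataFrame({
    "neighborhood": ["A", "A", "A", "A", "B"],
    "start": ["2018-09-03 08:00", "2018-09-03 09:00", "2018-09-03 14:00",
              "2018-09-03 22:00", "2018-09-03 13:00"],
})
idx = period_index(trips, "start")
```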

Morning Trips

Afternoon Trips

Night Trips

We could see a couple of interesting things:

Timelapse

To wrap it up, let's create a timelapse of the trips in Manhattan, which will give us a more complete view of how both modes are used. For example, the neighborhoods that are popular at 1 p.m. might not be the same ones that are popular at 5 p.m., and our previous maps couldn't show that level of detail.

We'll create 4 GIFs: Bike Trips on Weekdays, Bike Trips on Weekends, Uber Trips on Weekdays and Uber Trips on Weekends.
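A sketch of the two pieces a timelapse needs: splitting trips into 24 hourly frames (filter to one mode and one day type first), and stitching rendered frames into a GIF with Pillow's `Image.save(..., save_all=True, append_images=...)`. How each hourly map is turned into a PNG is not shown here; column names are assumptions.

```python
import pandas as pd

def hourly_frames(trips: pd.DataFrame, time_col: str, lat_col: str, lon_col: str) -> list:
    """24 lists of [lat, lon] points, one per start hour (0-23)."""
    hours = pd.to_datetime(trips[time_col]).dt.hour
    return [trips.loc[hours == h, [lat_col, lon_col]].values.tolist() for h in range(24)]

def frames_to_gif(png_paths: list, out_path: str, ms_per_frame: int = 500) -> None:
    """Stitch already-rendered PNG frames (one per hour) into a looping GIF."""
    from PIL import Image  # Pillow; imported here so hourly_frames works without it

    images = [Image.open(p).convert("RGB") for p in png_paths]
    # loop=0 repeats forever; duration is the display time of each frame in ms
    images[0].save(out_path, save_all=True, append_images=images[1:],
                   duration=ms_per_frame, loop=0)

# Made-up weekday trips (Monday Sep 3, 2018).
trips = pd.DataFrame({
    "start": ["2018-09-03 08:10", "2018-09-03 08:40", "2018-09-03 18:00"],
    "lat": [40.75, 40.76, 40.72],
    "lon": [-73.99, -73.98, -74.00],
})
frames = hourly_frames(trips, "start", "lat", "lon")
```

The list of 24 point lists is also the shape folium's `HeatMapWithTime` plugin accepts (one list of points per time step), which is an alternative if an in-browser animation is enough.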

Creating the GIFs